feat!: integrate all vision models with vision camera#880
feat!: integrate all vision models with vision camera#880NorbertKlockiewicz merged 71 commits intomainfrom
Conversation
66e358e to
0a8493b
Compare
787ea7d to
7488857
Compare
35c2a3f to
a8d95a8
Compare
af8ebfe to
951ab91
Compare
|
On Android the output for front camera is upside down and mirrored. Is this an expected behaviour that will be fixed with orientation handling? |
yes, I am working on it now but this PR already has too much changes so it will be added in the next one. |
| EXPECT_NO_THROW((void)model.generateFromString(kValidTestImagePath, false)); | ||
| auto result1 = model.generateFromString(kValidTestImagePath, false); | ||
| auto result2 = model.generateFromString(kValidTestImagePath, false); | ||
| ASSERT_TRUE(std::holds_alternative<PixelDataResult>(result1)); | ||
| ASSERT_TRUE(std::holds_alternative<PixelDataResult>(result2)); | ||
| EXPECT_NE(std::get<PixelDataResult>(result1).dataPtr, nullptr); | ||
| EXPECT_NE(std::get<PixelDataResult>(result2).dataPtr, nullptr); |
| TEST(StyleTransferPixelTests, ValidPixelsSaveToFileTrueReturnsString) { | ||
| StyleTransfer model(kValidStyleTransferModelPath, nullptr); | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeRgbView(buf, 64, 64); | ||
| auto result = model.generateFromPixels(view, true); | ||
| EXPECT_TRUE(std::holds_alternative<std::string>(result)); | ||
| } | ||
|
|
||
| TEST(StyleTransferPixelTests, ValidPixelsSaveToFileTrueHasFileScheme) { | ||
| StyleTransfer model(kValidStyleTransferModelPath, nullptr); | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeRgbView(buf, 64, 64); | ||
| auto result = model.generateFromPixels(view, true); | ||
| ASSERT_TRUE(std::holds_alternative<std::string>(result)); | ||
| EXPECT_TRUE(std::get<std::string>(result).starts_with("file://")); | ||
| } |
There was a problem hiding this comment.
The first test is a subset of the second one
| TEST(PixelsToMatValidInput, ProducesCorrectRows) { | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeValidView(buf, 48, 64); | ||
| EXPECT_EQ(pixelsToMat(view).rows, 48); | ||
| } | ||
|
|
||
| TEST(PixelsToMatValidInput, ProducesCorrectCols) { | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeValidView(buf, 48, 64); | ||
| EXPECT_EQ(pixelsToMat(view).cols, 64); | ||
| } |
There was a problem hiding this comment.
Merge this two into one test
| TEST(PixelsToMatValidInput, ProducesThreeChannelMat) { | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeValidView(buf, 4, 4); | ||
| EXPECT_EQ(pixelsToMat(view).channels(), 3); | ||
| } | ||
|
|
||
| TEST(PixelsToMatValidInput, MatTypeIsCV_8UC3) { | ||
| std::vector<uint8_t> buf; | ||
| auto view = makeValidView(buf, 4, 4); | ||
| EXPECT_EQ(pixelsToMat(view).type(), CV_8UC3); | ||
| } |
There was a problem hiding this comment.
Again, merge and make two check in one test
e07a54c to
190a19b
Compare
|
Also get these warnings when running demo apps: |
|
Trying to run regular style transfer freezes the app on my android phone. This is probably related to: #962 so we can ignore it for now. |
chmjkb
left a comment
There was a problem hiding this comment.
left a couple of comments for now, though I haven't finished reviewing :D
| ## VisionCamera integration | ||
|
|
||
| For real-time object detection on camera frames, use `runOnFrame`. It runs synchronously on the JS worklet thread and returns `Detection[]`. | ||
|
|
||
| See the full guide: [VisionCamera Integration](./visioncamera-integration.md). |
There was a problem hiding this comment.
Honestly I think just a link to the vision camera integration doc would be enough. This way we don't have to remember to update these docs when changing return types. It's a general comment to all the changes in docs.
There was a problem hiding this comment.
When downloading the model i got the maximum state depth update error
There was a problem hiding this comment.
oh, on android or ios, also when downloading which model?
There was a problem hiding this comment.
iOS and I believe it was the selfie segmentation one
There was a problem hiding this comment.
I've tried to reproduce it in iPhone 16 Pro and SE 3 and wasn't able to, as it's just the example app I suggest to "jeździć obserwować"
| sizesArray.setValueAtIndex(runtime, 2, jsi::Value(4)); | ||
| obj.setProperty(runtime, "sizes", sizesArray); | ||
|
|
||
| obj.setProperty(runtime, "scalarType", jsi::Value(0)); |
There was a problem hiding this comment.
use ScalarType::Byte instead of plain magic 0
| }, | ||
| ], | ||
| 'camelcase': 'error', | ||
| 'camelcase': ['error', { properties: 'never' }], |
There was a problem hiding this comment.
Why are we ignoring this rule for properties?
| forward: (imageSource: string) => Promise<string>; | ||
| forward<O extends 'pixelData' | 'url' = 'pixelData'>( | ||
| input: string | PixelData, | ||
| output?: O |
There was a problem hiding this comment.
maybe outputKind / outputType would be a bit better
| * | ||
| * **Note**: For VisionCamera frame processing, use `runOnFrame` instead. | ||
| * | ||
| * @param input - Image source (string or PixelData object) |
There was a problem hiding this comment.
I think you can link to pixeldata here
| * | ||
| * Available after model is loaded (`isReady: true`). | ||
| * | ||
| * @example |
There was a problem hiding this comment.
Other models dont have such example. I think we should follow a single approach
| /** | ||
| * Given a model configs record (mapping model names to `{ labelMap }`) and a | ||
| * type `T` (either a model name key or a raw {@link LabelEnum}), resolves to | ||
| * the label map for that model or `T` itself. | ||
| * | ||
| * @internal | ||
| */ | ||
| export type ResolveLabels< | ||
| T, | ||
| Configs extends Record<string, { labelMap: LabelEnum }>, | ||
| > = T extends keyof Configs | ||
| ? Configs[T]['labelMap'] | ||
| : T extends LabelEnum | ||
| ? T | ||
| : never; | ||
|
|
There was a problem hiding this comment.
i think to keep types seperate this should be moved to some other file
| private constructor(nativeModule: unknown) { | ||
| super(); | ||
| this.nativeModule = nativeModule; | ||
| } |
There was a problem hiding this comment.
Is there a reason why we're ditching this? I feel like this is a cleaner way of creating models, where the factory is responsible for creating the native object, and the module just receives a ready to use thing.
There was a problem hiding this comment.
I've selected wrong version in rebase :/
| const instance = new ImageEmbeddingsModule(); | ||
| instance.nativeModule = await global.loadImageEmbeddings(paths[0]); | ||
| return instance; |
There was a problem hiding this comment.
same here regarding the construction method
There was a problem hiding this comment.
When can this happen? Maybe we should throw if it does?
if (!this.nativeModule?.generateFromFrame) {
return null;
}There was a problem hiding this comment.
It would skip inference if the model isn't loaded, however I think we can just check if (this.nativeModule == null)
There was a problem hiding this comment.
My point is that skipping inference silently here might be unexpected for the user.
| if (!this.nativeModule?.generateFromFrame) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Same concern as in VisionModule
| TEST(StyleTransferThreadSafetyTests, TwoConcurrentGeneratesDoNotCrash) { | ||
| StyleTransfer model(kValidStyleTransferModelPath, nullptr); | ||
| std::atomic<int32_t> successCount{0}; | ||
| std::atomic<int32_t> exceptionCount{0}; | ||
|
|
||
| auto task = [&]() { | ||
| try { | ||
| (void)model.generateFromString(kValidTestImagePath, false); | ||
| successCount++; | ||
| } catch (const RnExecutorchError &) { | ||
| exceptionCount++; | ||
| } | ||
| }; | ||
|
|
||
| std::thread a(task); | ||
| std::thread b(task); | ||
| a.join(); | ||
| b.join(); | ||
|
|
||
| EXPECT_EQ(successCount + exceptionCount, 2); | ||
| } | ||
|
|
||
| TEST(StyleTransferThreadSafetyTests, |
There was a problem hiding this comment.
I feel like this could be a typed test for all the models
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…meProcessor.cpp Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>
b28847b to
836c7a2
Compare
| auto sizesArray = jsi::Array(runtime, 3); | ||
| sizesArray.setValueAtIndex(runtime, 0, jsi::Value(result.height)); | ||
| sizesArray.setValueAtIndex(runtime, 1, jsi::Value(result.width)); | ||
| sizesArray.setValueAtIndex(runtime, 2, jsi::Value(4)); |
There was a problem hiding this comment.
It's a number of channels but it should be passed the same way as the height and width, on it now!

Description
PixelDatainput support to all vision model hooks and modules —forward()now accepts both string URIs and raw pixel buffersrunOnFrameworklet to all vision model hooks for real-time VisionCamera v5 frame processing (useClassification,useImageEmbeddings,useOCR,useVerticalOCR,useObjectDetection,useSemanticSegmentation,useStyleTransfer)StyleTransferC++ to unifiedgenerateFromString/generateFromPixelswith asaveToFileflagPixelDatainput andrunOnFrameVisionModelTest(unit tests forextractFromPixelsandpreprocess),FrameProcessorTest(unit tests forpixelsToMat),StyleTransferTest(full coverage of both output modes), andgenerateFromPixelssmoke tests across all vision modelsBreaking changes
StyleTransferType.forwardreturn type changedBefore:
After:
Known Issues
The orientation handling is static right now, it will be fixed in the next PR.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Screenshots
Related issues
Checklist
Additional notes